Abstract — The One-Class Classifier (OCC) has been widely
used for solving one-class and multi-class classification
problems. Its main advantage in the multi-class setting is that it
offers an open system, so new classes can be added easily without
retraining the existing OCCs. However, extending the OCC to
multi-class classification yields lower accuracy than other
multi-class classifiers. Hence, in order to improve the accuracy
while keeping this advantage, we propose in this paper a Multiple
Classifier System (MCS) composed of different types of OCC.
Usually, the combination is performed using fixed or trained
rules, and the static weighted average is generally considered a
straightforward combination rule. In this paper, we propose a
dynamic weighted average rule that calculates the appropriate
weights for each test sample. Experimental results on several
real-world datasets demonstrate the effectiveness of the proposed
multiple classifier system: the dynamic weighted average rule
achieves the best results on most datasets compared with the
mean, max, product and static weighted average rules.

Keywords — one-class classifier, multiple classifier system,
multi-class classification, dynamic weighted average rule.

I. Introduction 

The One-Class Classifier (OCC) is designed to be trained only
on patterns belonging to the target class distribution. Its main
goal is to detect an anomaly, or a state other than that of the
target class [1], [2]. The underlying hypothesis is that only
information about the target class is available. Therefore, no
information about the potential nature of other classes is
needed to derive the decision boundary.

In the last decades, the OCC has attracted much attention from
researchers, leading to its use for solving the multi-class
classification problem [3], [4], [5], [6]. Indeed, extending the
classifier to new classes does not require retraining it on
all classes. In addition, the OCC is trained only on the target
class, which avoids the unbalanced data problem. This problem
usually appears when the training data of the target class are
significantly outnumbered by the other training instances. In
this case, separating the target class from the remaining
classes is a difficult task.

However, using the OCC for multi-class classification
usually achieves lower accuracy than the usual multi-class
classifiers [5]. Furthermore, due to the high diversity of
existing OCCs [7], choosing a specific classifier for various
applications is a difficult task. Therefore, combining different
OCCs is suitable since it can produce a better system in terms
of robustness and accuracy. In addition, it preserves the
advantage of an extensible multi-class system. In this case,
the most difficult problem is finding the best combination rule.

In order to perform the combination, a Multiple Classifier
System (MCS) of diverse classifiers must be created, for which
different ways are possible. The most popular ways are based
on different initializations, different parameter choices,
different architectures, different classifiers, different training
sets or different feature sets [8]. In the field of combining
OCCs to build a multi-class classification system,
Juszczak et al. [9] use Parzen OCC ensembles for classifying
missing data in multi-class problems. In a related work,
Muñoz-Marí et al. [10] demonstrate that simple combination
rules (e.g. average or product) applied to OCCs trained on
different feature sets can improve the classification accuracy
for remote sensing images. More recently, Abbas et al. [11]
used the Dezert-Smarandache theory to build a one-class support
vector machine (OC-SVM) ensemble, trained on different feature
sets, for handwritten digit recognition.

The previous methods are based on using the same type of
OCC trained on different feature sets or on different training
sets to yield complementary OCCs. However, different feature
sets are not always available. Moreover, in some applications,
training samples are scarce, which does not allow generating
different training sets.

Hence, we propose in this paper an alternative approach,
which relies on building a multiple one-class classifier by
combining different types of OCC, trained on the same feature
set with the same training set, for solving the multi-class
classification problem. The combination step is performed
through fixed and trained rules. For the latter, in addition to
the widely used weighted average, which has been investigated
for classifier combination [12], [13], [14], we propose a
dynamic weighted average rule that measures the importance of
the classifiers by calculating suitable weights for each test
sample.

The remainder of this paper is organized as follows. In
section 2, we review the used OCCs and their extension to
multi-class classification. Section 3 describes the proposed MCS
based on OCCs. In order to evaluate the effectiveness of the
proposed approach, experimental results on various
datasets are presented in section 4. Finally, the conclusion is
provided in the last section.

II. Overview of One-Class Classifiers for 
Multi-Class classification 

Different algorithms have been proposed for one-class
classification. In this work, three types of OCC are selected:
the One-Class Nearest Neighbor (OC-NN), the one-class neural
network, which is usually referred to as an auto-encoder or as an
Auto-Associative Neural Network (AANN), and the One-Class
Support Vector Machine (OC-SVM). In the following, we briefly
review the properties of the used OCCs and their extension to
multi-class classification.

A. One-Class Nearest Neighbor Classifier 

The one-class nearest neighbor (OC-NN) is a particular
case of the OC-KNN in which K is set to one. According to
[1], [7], OC-NN finds the distance of a test object x to its
nearest neighbor in the training set, and the method estimates
the density as:

$$\mathit{dnn}(x) = \frac{1/n}{V\left(\|x - NN^{tr}(x)\|\right)} \qquad (1)$$

where n is the number of training samples, $NN^{tr}(x)$ is the
nearest neighbor of x in the training set, and V defines the
smallest volume centred on x that contains this nearest neighbor.

A test sample x may either be rejected as being an outlier, 
or accepted as being part of the target class according to the 
threshold value defined in the training step. 
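
The accept/reject step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Euclidean distance, thresholds the nearest-neighbor distance directly (the density of Eq. (1) decreases as this distance grows), and sets the threshold from leave-one-out distances on the training set. The function names and the `reject_fraction` parameter are hypothetical.

```python
import math

def nn_distance(x, train, skip=None):
    """Distance from x to its nearest neighbor in train (optionally skipping one index)."""
    return min(math.dist(x, t) for i, t in enumerate(train) if i != skip)

def fit_threshold(train, reject_fraction=0.1):
    """Assumed rule: choose the threshold so that roughly reject_fraction of the
    training samples would be rejected by their leave-one-out NN distance."""
    dists = sorted(nn_distance(t, train, skip=i) for i, t in enumerate(train))
    keep = max(1, math.ceil(len(dists) * (1.0 - reject_fraction)))
    return dists[keep - 1]

def accept(x, train, threshold):
    """Accept x as a target-class sample when its NN distance is within the threshold."""
    return nn_distance(x, train) <= threshold

# Toy target class: five points clustered near the origin.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
thr = fit_threshold(train, reject_fraction=0.0)
```

With this toy data, a point close to the cluster is accepted while a distant point is rejected as an outlier.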

B. Auto-Associative Neural Network Classifier 

Neural networks are composed of interconnected
processing units arranged in one or several layers that can be
used to implement a complex functional mapping between
input and output variables. The weights of the neural network
are adjusted using training data so that an error function is
minimized over the training set.

The basic design of the AANN is a “bottleneck”: data
represented in a p-dimensional space are mapped to a
lower-dimensional space and then reproduced, in order to test the
reproduction ability of the model. Usually, the AANN is
composed of three layers, with p inputs, p outputs and k neurons
in the hidden layer, where k < p. The AANN is then trained using
the standard back-propagation algorithm to learn the identity
function over the training set. This design has been used
successfully by Cottrell and Zipser [15] to produce a compression
algorithm and by Japkowicz et al. [16] for novelty detection.

Let $S = \{x_1, \ldots, x_n\}$ define a training set. The AANN is
trained on each sample in order to produce an identity function
$f$ that assigns to each input $x_i$ an output $f(x_i)$,
$i = 1, \ldots, n$, ideally taking the following form:

$$f(x_i) = x_i \qquad (2)$$

The principle of the AANN is to adjust its weights
according to the reconstruction error, which is defined as
the distance between an output and its corresponding input.
Formally, the reconstruction error is defined as:

$$Er(x) = \|f(x) - x\| \qquad (3)$$

such that $x \in S$.

A test sample x may either be rejected or accepted 
according to the threshold value defined in the training step. 
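
The decision based on the reconstruction error of Eq. (3) can be illustrated with the sketch below. It does not train a neural network: `project_on_diagonal` is a hand-made stand-in for a trained AANN with a one-unit bottleneck, and taking the largest training error as the threshold is an assumption made for illustration only.

```python
import math

def reconstruction_error(f, x):
    """Er(x) = ||f(x) - x||, as in Eq. (3)."""
    return math.dist(f(x), x)

def project_on_diagonal(x):
    """Toy stand-in for a trained AANN (assumption): encode a 2-D point as its
    mean (one-unit bottleneck), then decode it back onto the line x1 = x2."""
    code = (x[0] + x[1]) / 2.0
    return (code, code)

def fit_threshold(f, train):
    """Assumed rule: the largest reconstruction error seen on the training set."""
    return max(reconstruction_error(f, x) for x in train)

def accept(f, x, threshold):
    """Accept x when it is reconstructed at least as well as the training data."""
    return reconstruction_error(f, x) <= threshold

# Toy target class lying close to the diagonal.
train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.05)]
thr = fit_threshold(project_on_diagonal, train)
```

A point near the diagonal is reconstructed with a small error and accepted, whereas a point far from it is rejected.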

C. One-Class Support Vector Machine Classifier 

The concept of the OC-SVM consists in finding a hypersphere
of minimum volume that contains most of the training samples.
More specifically, the objective of the OC-SVM is to estimate a
function $f(x)$ that encloses most of the training samples in a
hypersphere $R_x = \{x \in \mathbb{R}^d \mid f(x) > 0\}$ with a
minimum volume, where d is the size of the feature vector [17].
$f(x)$ is the decision function, which takes the form:

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} \alpha_i K(x, x_i) - \rho\right) \qquad (4)$$

where $\alpha_i$ denotes the Lagrange multipliers, computed by
optimizing the following equations:

$$\min_{\alpha} \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \qquad (5)$$

subject to

$$0 \le \alpha_i \le \frac{1}{\nu n} \qquad (6)$$

$$\sum_{i} \alpha_i = 1 \qquad (7)$$



$\rho$ defines the distance of the hypersphere from the origin,
and $\nu$ is the percentage of data considered as outliers.
$K(\cdot, \cdot)$ defines the OC-SVM kernel, which allows
projecting data from the original space to the feature space.

A pattern x is then accepted when $f(x) > 0$; otherwise, it
is rejected. Various kernel functions can be used, such as the
polynomial, Radial Basis Function (RBF) or multilayer perceptron
kernels [18]. The RBF is the most widely used kernel; it allows
determining the radius of the hypersphere according to the
parameter $\gamma$. It is defined by:

$$K(x, x_j) = \exp\left(-\gamma\, d(x, x_j)\right) \qquad (8)$$

such that

$$d(x, x_j) = \|x - x_j\|^2 \qquad (9)$$
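
As an illustration of the decision function of Eq. (4) with the RBF kernel of Eqs. (8) and (9), the sketch below evaluates the kernel sum for a given set of multipliers. The multipliers, offset and kernel parameter are hypothetical values chosen by hand; they are not obtained by solving Eqs. (5) to (7).

```python
import math

def rbf_kernel(x, xj, gamma):
    """K(x, x_j) = exp(-gamma * ||x - x_j||^2), Eqs. (8)-(9)."""
    return math.exp(-gamma * math.dist(x, xj) ** 2)

def kernel_sum(x, samples, alphas, gamma):
    """Weighted kernel sum: sum_i alpha_i * K(x, x_i)."""
    return sum(a * rbf_kernel(x, s, gamma) for a, s in zip(alphas, samples))

def decision(x, samples, alphas, rho, gamma):
    """Eq. (4): +1 when the kernel sum exceeds rho (accepted), -1 otherwise."""
    return 1 if kernel_sum(x, samples, alphas, gamma) - rho > 0 else -1

# Hypothetical values for illustration only.
samples = [(0.0, 0.0), (1.0, 0.0)]
alphas = [0.5, 0.5]          # sums to 1, as required by Eq. (7)
rho, gamma = 0.4, 1.0
```

Larger values of `gamma` make the kernel decay faster with distance, which tightens the accepted region around the training samples.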

The extension of the OC-SVM to the multi-class case is
proposed in [3], where a logarithmic function is used for
calibrating the outputs, defined as follows:

$$d(x, x_j) = -\log\left(\sum_{j=1}^{n} \alpha_j K(x, x_j)\right) + \log(\rho) \qquad (10)$$

Fig. 1. The proposed MCS architecture



D. Extension of OCCs for Multi-Class Classification

Extending the OCC to the multi-class case is straightforward:
for a defined set of m classes $C = \{c_1, \ldots, c_m\}$, each
class has its corresponding OCC. Once the OCC model of each class
has been obtained, a test sample is assigned to the class whose
OCC generates the maximum prediction. The class label $y(x)$ of a
test sample x is defined mathematically as follows:

$$y(x) = \arg\max_{j} \left(OCC_j(x)\right), \quad j = 1, \ldots, m \qquad (11)$$

where $OCC_j$ is one of the OCCs (OC-NN, AANN or OC-SVM).
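
The decision rule of Eq. (11) amounts to an arg max over per-class scores, as in the following sketch. The scoring functions are hypothetical placeholders (negative distance to an assumed class centre), standing in for any OCC whose output grows with target-class membership.

```python
def predict_class(x, occ_scores):
    """Eq. (11): return the class whose OCC gives the largest prediction for x.

    occ_scores maps a class label to a scoring function (higher = more target-like).
    """
    return max(occ_scores, key=lambda label: occ_scores[label](x))

# Hypothetical 1-D scorers: negative distance to an assumed class centre.
occ_scores = {
    "c1": lambda x: -abs(x - 0.0),
    "c2": lambda x: -abs(x - 5.0),
}

# A new class is added without modifying or retraining the existing scorers.
occ_scores["c3"] = lambda x: -abs(x - 10.0)
```

Because each class owns an independent scorer, adding `"c3"` leaves the entries for `"c1"` and `"c2"` untouched, which is the open-system property discussed in the introduction.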

III. One-Class Classifiers Combination for
Multi-Class Classification

The basic structure of the proposed MCS based on OCCs is
depicted in Fig. 1. The MCS is composed of three different
types of OCC, which operate in parallel on the same data. A
description of each stage of the MCS is given in the
following sections.

A. Normalization of OCC Outputs 

Several combination rules are possible to build the MCS,
but all of them assume a common interpretation of the
confidence values as a posteriori probabilities for each test
sample x. Hence, the outputs of each classifier must be
normalized into a posteriori probabilities before the
combination can be performed correctly.



Thus, we propose to use the softmax normalization
method [19], which is adopted for its simplicity and
effectiveness. This function maps the outputs into the
range [0, 1]. It is used for the three classifiers as follows:

• For the OC-NN, the density response $\mathit{dnn}_j$ of each
class $c_j$ among the m classes is transformed to $P_1(c_j/x)$
through:

$$P_1(c_j/x) = \frac{\exp\left(\mathit{dnn}_j(x)\right)}{\sum_{k=1}^{m} \exp\left(\mathit{dnn}_k(x)\right)} \qquad (12)$$



• For the AANN, the reconstruction error $Er_j$ of each
class $c_j$ is transformed as follows:






(13) 



• For the OC-SVM, the outputs $d_j$ of each class $c_j$ are
transformed by the following equation:

$$P_3(c_j/x) = \frac{\exp\left(d_j(x, x_j)\right)}{\sum_{k=1}^{m} \exp\left(d_k(x, x_k)\right)} \qquad (14)$$
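
The softmax mapping used in Eqs. (12) and (14) can be sketched as below. Subtracting the maximum score before exponentiation is an implementation detail added here for numerical stability; it leaves the result mathematically unchanged and is not taken from the paper. The input scores are hypothetical.

```python
import math

def softmax(scores):
    """Map per-class scores to values in [0, 1] that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-class outputs of one classifier for a single test sample.
scores = [2.0, 0.5, -1.0]
posteriors = softmax(scores)
```

The class with the largest raw score keeps the largest normalized value, so the arg max decision of a single classifier is unaffected by the normalization.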



B. Combination Rules 

Various combination rules are possible for building an
enhanced MCS. In our case, two groups are considered, which
are based on fixed [8] and trained combination rules,
respectively.

The MCS is defined as a set of L classifiers, which are
combined through the following rules:

1) Fixed Combination Rules

a) Mean combination rule:

$$y(x) = \arg\max_{j} \left(\frac{1}{L} \sum_{i=1}^{L} P_i(c_j/x)\right), \quad j = 1, \ldots, m. \qquad (16)$$

b) Product combination rule:

$$y(x) = \arg\max_{j} \left(\prod_{i=1}^{L} P_i(c_j/x)\right), \quad j = 1, \ldots, m. \qquad (17)$$

c) Max combination rule:

$$y(x) = \arg\max_{j} \left(\max_{i} P_i(c_j/x)\right), \quad j = 1, \ldots, m. \qquad (18)$$

2) Trained Combination Rules

a) Static weighted average combination rule (SWA):

This rule has been widely used for combining classifiers [8],
[12], [13], [14]. In our case, a weight is assigned to each
individual classifier to represent its importance within the
MCS. However, this importance is assumed to be the same for all
test samples. Indeed, the individual classifiers can differ from
each other in terms of performance, which is measured using a
validation dataset. The obtained results can be used for
selecting the weights assigned to the individual classifiers.

This method relies on the average error rates (AERs) of the
classifiers, which are calculated in the validation step. Denote
the AER of classifier i as $r_i$, $i = 1, \ldots, L$, where L is
the number of classifiers. Then, a weight $w_i$ is assigned to
classifier i according to its AER, such that

$$0 < w_i < 1 \quad \text{and} \quad \sum_{i=1}^{L} w_i = 1. \qquad (19)$$

The class label $y(x)$ of a test sample x is defined
mathematically as follows:

$$y(x) = \arg\max_{j} \left(\sum_{i=1}^{L} w_i P_i(c_j/x)\right), \quad j = 1, \ldots, m. \qquad (20)$$

b) Dynamic weighted average combination rule (DWA):

The main drawback of the SWA is that all samples are
weighted by the same values. This approach is not efficient
since each sample has its own importance. Indeed, each
classifier recognizes well a set of samples that is not
well recognized by the others. Therefore, the contribution of
each classifier must be calculated for each test sample.

In order to overcome this drawback, we propose to
calculate the weight assigned to each classifier for each test
sample according to the maximum response of the classifier kept
from the validation step.

More precisely, when the response of the classifier (i.e. the
maximum value over all classes) is close to its maximum
response, a high contribution is assigned to the classifier
relative to the other classifiers. In contrast, when the
response of the classifier is lower than its maximum response,
its contribution is reduced.

Denote by $w_i^x$ the weight assigned to classifier i for a test
sample x, and by $p_i^{max}$ the maximum response of classifier i
calculated in the validation step. The weight assigned to each
sample for the i-th classifier is defined as:

$$w_i^x = \frac{\max_j\left(P_i(c_j/x)\right) / p_i^{max}}{\sum_{l=1}^{L} \max_j\left(P_l(c_j/x)\right) / p_l^{max}}, \quad j = 1, \ldots, m \qquad (21)$$

such that

$$\sum_{i=1}^{L} w_i^x = 1 \qquad (22)$$

The class label $y(x)$ of a test sample x is defined
mathematically as follows:

$$y(x) = \arg\max_{j} \left(\sum_{i=1}^{L} w_i^x P_i(c_j/x)\right), \quad j = 1, \ldots, m. \qquad (23)$$

Finally, a test sample x is assigned to the class for which
the Combined-OCC provides the maximum prediction value.

As each class is represented by a Combined-OCC, adding a
new class to the system does not require retraining it on
all classes; it only requires adding new OCCs, each trained on
the data of the new class.

IV. Experimental Results

A. Dataset Description

For evaluating the proposed MCS, four datasets are
selected from the ELENA project [20], which represent real
applications: iris, phoneme, satimage and texture. In addition,
we use the Breast Cancer [21], Crab Gender [22] and handwritten
digits [23] datasets. All these datasets are summarized in
Table I.

We randomly partition each dataset into three subsets, as
reported in Table I. The first subset is used for training
the classifiers, the second subset is used for selecting the
optimal parameters of each classifier, and the last subset is
used for testing the MCS.



B. Tuning of Parameters for Multiple Classifier System 

The MCS is composed of three classifiers which are 
trained separately on the same feature set using the same 
training set. However, each classifier has its own parameters 
which must be tuned. 

For the OC-NN classifier, no parameters need to be tuned,
unlike the other classifiers (AANN and OC-SVM).

The AANN has two parameters to be tuned, namely the
number of epochs and the number of nodes. In order to select
the optimal parameters for each class, the AANN is trained on
the training dataset with different parameter values. The
optimal parameters are then those for which the AANN achieves
the best reconstruction of the validation dataset.

The OC-SVM also has two parameters ($\nu$, $\gamma$), which are
fixed for each class through the training and validation steps.
In the training step, different pairs of parameter values are
generated in order to achieve the best recognition rate on the
training dataset. The validation step then selects the pair of
parameters that provides the highest recognition rate.
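
The selection of a parameter pair on the validation set can be sketched as a simple grid search. This is only an illustrative outline: `train_fn` and `score_fn` are hypothetical callables standing in for OC-SVM training and validation scoring, and the grid values and the toy score are made up for the example.

```python
from itertools import product

def select_parameters(train_fn, score_fn, grid):
    """Try every parameter combination and keep the one with the best validation score."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        model = train_fn(**params)       # train on the training subset
        score = score_fn(model)          # evaluate on the validation subset
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-ins (assumptions): the "model" is just its parameters, and the
# validation score peaks at nu = 0.1, gamma = 1.0.
train_fn = lambda nu, gamma: (nu, gamma)
score_fn = lambda model: -abs(model[0] - 0.1) - abs(model[1] - 1.0)
grid = {"nu": [0.05, 0.1, 0.2], "gamma": [0.1, 1.0, 10.0]}

best_params, best_score = select_parameters(train_fn, score_fn, grid)
```

The same loop applies to the AANN by replacing the grid with candidate numbers of epochs and nodes and the score with the negative reconstruction error on the validation set.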

C. Results and Discussion

Results for the individual classifiers and for the MCS with
different combination rules on the used datasets are
reported in Table II. Firstly, we can note that the MCS
improves the recognition rate over the best single classifier
for all datasets. Secondly, a closer look at the results shows
that DWA improves the recognition rate whatever the selected
application. In contrast, the behaviour of the remaining
combination rules (Mean, Max, Prod, SWA) depends on the
selected dataset. For instance, the Mean and Prod rules are
suitable for the Iris, Texture and Breast Cancer datasets.
Among the individual classifiers, the OC-NN is the most
suitable for all applications except the Breast Cancer dataset,
for which the AANN performs best.

It is interesting to note that on the Satimage dataset, DWA
is the only combination rule that improves the recognition
rate (87.46% versus 86.77% for the best single classifier,
OC-NN). This supports the usefulness of weighting each
classifier dynamically according to its response value.
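
The fixed rules of Eqs. (16) to (18) and the per-sample weighting of Eqs. (21) to (23) can be sketched as follows. The posterior values and validation maxima are hypothetical numbers chosen for illustration, and the static weights of the SWA rule are simply passed in, since any weights satisfying Eq. (19) can be used with `weighted_rule`.

```python
import math

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def mean_rule(P):
    """Eq. (16). P[i][j] is P_i(c_j/x) for classifier i and class j."""
    return argmax([sum(p[j] for p in P) / len(P) for j in range(len(P[0]))])

def product_rule(P):
    """Eq. (17): product of the posteriors over the classifiers."""
    return argmax([math.prod(p[j] for p in P) for j in range(len(P[0]))])

def max_rule(P):
    """Eq. (18): largest posterior over the classifiers."""
    return argmax([max(p[j] for p in P) for j in range(len(P[0]))])

def weighted_rule(P, w):
    """Eqs. (20) and (23): weighted sum of the posteriors with weights w."""
    return argmax([sum(wi * p[j] for wi, p in zip(w, P)) for j in range(len(P[0]))])

def dynamic_weights(P, p_max):
    """Eq. (21): each classifier's top response relative to its validation
    maximum, normalized so that the weights sum to 1 (Eq. (22))."""
    ratios = [max(p) / pm for p, pm in zip(P, p_max)]
    total = sum(ratios)
    return [r / total for r in ratios]

# Hypothetical posteriors of L = 3 classifiers over m = 3 classes for one sample.
P = [
    [0.40, 0.35, 0.25],   # classifier 1: undecided
    [0.10, 0.80, 0.10],   # classifier 2: confident in class index 1
    [0.45, 0.30, 0.25],   # classifier 3: weak preference for class index 0
]
p_max = [0.90, 0.85, 0.90]   # assumed maximum responses from the validation step
w = dynamic_weights(P, p_max)
label = weighted_rule(P, w)
```

In this example the second classifier responds close to its validation maximum, so it receives the largest weight for this particular sample, whereas a static rule would apply the same weights to every sample.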



V. Conclusion 

This paper proposes a new MCS for solving the
multi-class classification problem based on OCCs. This MCS
is composed of different types of OCC trained in the same
feature space using the same training set. In order to combine
the classifiers, fixed and trained combination rules are
compared to find the most suitable rule.

Experimental results on several real-world datasets show
that the proposed MCS achieves better results than the best
individual classifier when using the dynamic weighted average,
which compares favourably with the mean, max, product and
static weighted average rules.

The results also indicate that combining all classifiers is
not necessary for all datasets. Hence, an extension of this work
consists in selecting, for each class, the most suitable OCCs in
order to achieve a robust MCS. This is a challenging task that
can be decomposed into two main problems. The first is to find
the best diversity measure for selecting the suitable
classifiers for each class. The second is the need for a
calibration function for the outputs originating from different
types of classifiers for each class.



Table I. Datasets Used for Evaluating the Proposed MCS

Dataset         # Classes   # Features   # Training samples   # Validation samples   # Test samples
Phoneme         2           5            200                  200                    5004
Iris            3           4            45                   45                     60
Texture         11          40           732                  731                    4037
Satimage        6           36           600                  600                    5235
Breast Cancer   2           9            234                  233                    232
Crab            2           6            66                   66                     68
Digits (USPS)   10          14           3646                 3641                   2007



Table II. Classification Accuracy (%) of Individual Classifiers and Combination Methods

                Classifier                   Combination rules
Dataset         OC-NN    AANN     OC-SVM     Mean     Max      Prod     SWA      DWA
Phoneme         80.94    73.20    75.50      80.98    80.97    80.97    80.99    81.63
Iris            95.00    70.00    90.00      96.67    90.00    96.67    96.67    96.67
Texture         97.08    91.01    93.39      97.18    91.29    97.18    97.18    97.18
Satimage        86.77    70.26    78.61      85.88    70.35    86.32    84.05    87.46
Breast Cancer   96.26    96.41    95.65      98.44    98.08    98.44    98.44    98.44
Crab Gender     83.82    82.35    82.35      85.29    85.29    85.29    86.76    88.24
Digits (USPS)   81.56    79.47    71.20      82.71    79.92    82.61    82.61    83.01